
A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model


Abstract

In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
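
As a rough illustration of the three-part cost function described in the abstract (formant distance, parameter regularization, and continuity), the following minimal sketch may help. It rests on several assumptions not taken from the paper: the squared relative formant error, the weights `reg_weight` and `cont_weight`, and the names `inversion_cost` and `toy_synthesize` are all hypothetical, and `toy_synthesize` is a simple linear stand-in for an articulatory synthesizer, not the Maeda model or a chain matrix calculation.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def inversion_cost(
    trajectory: Sequence[Vector],
    target_formants: Sequence[Vector],
    synthesize: Callable[[Vector], Vector],
    reg_weight: float = 0.1,   # assumed weight, not from the paper
    cont_weight: float = 1.0,  # assumed weight, not from the paper
) -> float:
    """Sum of formant-distance, regularization, and continuity terms."""
    total = 0.0
    for t, (params, target) in enumerate(zip(trajectory, target_formants)):
        synth = synthesize(params)
        # Formant term: squared relative error over the first three formants.
        total += sum(((s - f) / f) ** 2 for s, f in zip(synth[:3], target[:3]))
        # Regularization term: penalize parameters far from the neutral value 0.
        total += reg_weight * sum(p * p for p in params)
        # Continuity term: penalize frame-to-frame jumps in the parameters.
        if t > 0:
            prev = trajectory[t - 1]
            total += cont_weight * sum((p - q) ** 2 for p, q in zip(params, prev))
    return total


def toy_synthesize(params: Vector) -> Vector:
    """Hypothetical linear parameter-to-formant map (NOT the Maeda model)."""
    neutral = (500.0, 1500.0, 2500.0)  # illustrative neutral formants in Hz
    return tuple(f + 100.0 * p for f, p in zip(neutral, params))
```

In a full analysis-by-synthesis loop, a cost of this shape would be minimized over the whole trajectory by a gradient-based optimizer, starting from an initial guess such as the codebook lookup the abstract mentions.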
